
    GASCOM: Graph-based Attentive Semantic Context Modeling for Online Conversation Understanding

    Online conversation understanding is an important yet challenging NLP problem with many useful applications (e.g., hate speech detection). However, online conversations typically unfold over a series of posts and replies to those posts, forming a tree structure within which individual posts may refer to semantic context from higher up the tree. Such semantic cross-referencing makes it difficult to understand a single post by itself; yet considering the entire conversation tree is not only difficult to scale but can also be misleading, as a single conversation may have several distinct threads or points, not all of which are relevant to the post being considered. In this paper, we propose a Graph-based Attentive Semantic COntext Modeling (GASCOM) framework for online conversation understanding. Specifically, we design two novel algorithms that utilise both the graph structure of the online conversation and the semantic information from individual posts to retrieve relevant context nodes from the whole conversation. We further design a token-level multi-head graph attention mechanism that attends differently to different tokens from the selected context utterances for fine-grained conversation context modeling. Using this semantic conversational context, we re-examine two well-studied problems: polarity prediction and hate speech detection. Our proposed framework significantly outperforms state-of-the-art methods on both tasks, improving macro-F1 scores by 4.5% for polarity prediction and by 5% for hate speech detection. The GASCOM context weights also enhance interpretability.

    HateRephrase: Zero- and Few-Shot Reduction of Hate Intensity in Online Posts using Large Language Models

    Hate speech has become pervasive in today's digital age. Although there has been considerable research to detect hate speech or generate counter speech to combat hateful views, these approaches still cannot completely eliminate the potential harmful societal consequences of hate speech: even when detected, hate speech often cannot be taken down or is not taken down often enough, and it unfortunately spreads quickly, often much faster than any generated counter speech. This paper investigates a relatively new yet simple and effective approach of suggesting a rephrasing of potential hate speech content even before the post is made. We show that Large Language Models (LLMs) perform well on this task, outperforming state-of-the-art baselines such as BART-Detox. We develop four prompts based on task description, hate definition, few-shot demonstrations and chain-of-thought, and run comprehensive experiments on open-source LLMs such as LLaMA-1, LLaMA-2 chat and Vicuna, as well as OpenAI's GPT-3.5. We propose various evaluation metrics to measure the efficacy of the generated text and ensure that it has reduced hate intensity without drastically changing the semantic meaning of the original text. We find that LLMs with a few-shot demonstrations prompt work best in generating acceptable hate-rephrased text with semantic meaning similar to the original text. Overall, we find that GPT-3.5 outperforms the baseline and open-source models for all the different kinds of prompts. We also perform human evaluations and, interestingly, find that the rephrasings generated by GPT-3.5 outperform even the human-generated ground-truth rephrasings in the dataset. We also conduct detailed ablation studies to investigate why LLMs work satisfactorily on this task and conduct a failure analysis to understand the gaps.

    A Graph-Based Context-Aware Model to Understand Online Conversations

    Online forums that allow for participatory engagement between users have been transformative for the public discussion of many important issues. However, such conversations can sometimes escalate into full-blown exchanges of hate and misinformation. Existing approaches in natural language processing (NLP), such as deep learning models for classification tasks, use as inputs only a single comment or a pair of comments, depending upon whether the task concerns the inference of properties of the individual comments or the replies between pairs of comments, respectively. But in online conversations, comments and replies may be based on external context beyond the immediately relevant information that is input to the model. Therefore, being aware of the conversations' surrounding contexts should improve the model's performance for the inference task at hand. We propose GraphNLI, a novel graph-based deep learning architecture that uses graph walks to incorporate the wider context of a conversation in a principled manner. Specifically, a graph walk starts from a given comment and samples "nearby" comments in the same or parallel conversation threads, which results in additional embeddings that are aggregated together with the initial comment's embedding. We then use these enriched embeddings for downstream NLP prediction tasks that are important for online conversations. We evaluate GraphNLI on two such tasks, polarity prediction and misogynistic hate speech detection, and find that our model consistently outperforms all relevant baselines for both tasks. Specifically, GraphNLI with a biased root-seeking random walk achieves macro-F1 scores 3 and 6 percentage points higher than the best-performing BERT-based baselines for the polarity prediction and hate speech detection tasks, respectively.
    Comment: 25 pages, 9 figures. arXiv admin note: text overlap with arXiv:2202.0817
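
    The graph walk described in this abstract can be illustrated with a minimal sketch. It is not the GraphNLI implementation: the function names, the `p_up` bias parameter, the dictionary-based tree and the mean aggregation are all assumptions made for illustration. The walk starts at a comment, prefers moving towards the root of the conversation tree, and the embeddings of the visited comments are averaged.

```python
import random
from typing import Dict, List, Optional


def root_seeking_walk(
    parent: Dict[str, Optional[str]],
    children: Dict[str, List[str]],
    start: str,
    steps: int,
    p_up: float = 0.75,  # hypothetical bias towards the parent (root direction)
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return the comments visited by a walk that prefers moving to the parent."""
    rng = rng or random.Random()
    node = start
    path = [start]
    for _ in range(steps):
        up = parent.get(node)
        downs = children.get(node, [])
        if up is not None and (not downs or rng.random() < p_up):
            node = up  # step towards the root
        elif downs:
            node = rng.choice(downs)  # step into a reply, possibly a parallel thread
        else:
            break  # isolated comment: nowhere to go
        path.append(node)
    return path


def aggregate(embeddings: Dict[str, List[float]], nodes: List[str]) -> List[float]:
    """Average the embeddings of the distinct visited comments (assumed aggregation)."""
    unique = list(dict.fromkeys(nodes))  # drop repeats, keep visit order
    dim = len(embeddings[unique[0]])
    return [sum(embeddings[n][i] for n in unique) / len(unique) for i in range(dim)]


# Toy conversation tree: root <- a <- c, root <- b
parent = {"root": None, "a": "root", "b": "root", "c": "a"}
children = {"root": ["a", "b"], "a": ["c"]}
embeddings = {"root": [2.0, 2.0], "a": [0.0, 1.0], "b": [5.0, 5.0], "c": [1.0, 0.0]}

path = root_seeking_walk(parent, children, "c", steps=2, p_up=1.0)
context_vector = aggregate(embeddings, path)
```

    With `p_up=1.0` the walk is deterministic and climbs straight from the comment to the root; lower values let it wander into sibling threads.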

    Under the Spotlight: Web Tracking in Indian Partisan News Websites

    India is experiencing intense political partisanship and sectarian divisions. The paper performs, to the best of our knowledge, the first comprehensive analysis of the Indian online news media with respect to tracking and partisanship. We build a dataset of 103 online, mostly mainstream news websites. With the help of two experts, alongside data from the Media Ownership Monitor of Reporters Without Borders, we label these websites according to their partisanship (Left, Right, or Centre). We study and compare user tracking on these sites with different metrics: numbers of cookies, cookie synchronizations, device fingerprinting, and invisible pixel-based tracking. We find that Left and Centre websites serve more cookies than Right-leaning websites. However, through cookie synchronization, more user IDs are synchronized in Left websites than in Right or Centre ones. Canvas fingerprinting is used similarly by Left and Right, and less by Centre. Invisible pixel-based tracking is 50% more intense in Centre-leaning websites than in Right, and 25% more than in Left. Desktop versions of news websites deliver more cookies than their mobile counterparts. A handful of third parties track users on most websites in this study. This paper, by demonstrating intense web tracking, has implications for research on the overall privacy of users visiting partisan news websites in India.

    Area and mass changes of Siachen Glacier (East Karakoram)

    The authors thank the European Space Agency for providing the Envisat data under the AOE 668 project. T. Bolch and T. Strozzi acknowledge funding by the European Space Agency (ESA) within the Glaciers_cci project (code 4000109873/14/I-NB).
    Here, we present a comprehensive assessment of Siachen Glacier (East Karakoram), in terms of its area and elevation change, velocity variations and mass budget, utilizing different satellite datasets including Landsat, Hexagon, Cartosat-I, Shuttle Radar Topography Mission, Envisat Advanced Synthetic Aperture Radar and Japanese Advanced Land Observing Satellite Phased Array-type L-band SAR. The total areal extent of Siachen Glacier did not change significantly between 1980 and 2014; however, the exposed-ice area decreased during that period. The terminus of the glacier has experienced substantial downwasting (on average 30 m) over the period of 1999-2007, followed by a retreat of the transition between exposed and debris-covered ice by a distance of 1.3 km during the short span 2007-14. The spatial patterns of the elevation difference and velocity are heterogeneous over the large areal extent of Siachen Glacier. The average velocity of the entire glacier, as computed between 11 December 2008 and 26 January 2009, was 12.3 ± 0.4 cm d⁻¹, while those estimated separately for the accumulation and ablation regions were 9.7 ± 0.4 cm d⁻¹ and 20.4 ± 0.4 cm d⁻¹, respectively. The mass budget of Siachen Glacier is estimated to be -0.03 ± 0.21 m w.e. a⁻¹ for the period of 1999-2007.

    Synchronization of Boron application methods and rates is environmentally friendly approach to improve quality attributes of Mangifera indica L. on sustainable basis

    Micronutrient deficiency in the soil is one of the major causes of poor mango yield and fruit quality. Besides, the consumption of such a diet also causes micronutrient deficiency in humans. Boron (B) deficiency adversely affects flowering and pollen tube formation, thus decreasing mango yield and quality attributes. Soil and foliar applications of B are considered productive methods to alleviate boron deficiency. A field experiment was conducted to identify the most suitable boron application method and rate in mango under the current climatic scenario. Nine treatments were applied in three replications. The results showed that T8 = RD + Borax (75 g plant⁻¹ as a basal application) + H3BO3 (0.8% as a foliar spray) and T9 = RD + Borax (150 g plant⁻¹ as a basal application) + H3BO3 (0.8% as a foliar spray) significantly enhanced the nitrogen, potassium, proteins, ash, fats, fiber, and total soluble solids in mango compared to the control. A significant decrease in sodium, total phenolic contents, antioxidant activity, and acidity (as citric acid) also validated the effectiveness of T8 and T9 compared to the control. In conclusion, T8 and T9 are potent strategies to improve the quality attributes of mango under the changing climatic situation.

    AnnoBERT: Effectively Representing Multiple Annotators’ Label Choices to Improve Hate Speech Detection

    Supervised machine learning approaches often rely on a "ground truth" label. However, obtaining one label through majority voting ignores the important subjectivity information in tasks such as hate speech detection. Existing neural network models principally regard labels as categorical variables, while ignoring the semantic information in diverse label texts. In this paper, we propose AnnoBERT, a first-of-its-kind architecture that integrates annotator characteristics and label text with a transformer-based model to detect hate speech. It builds unique representations based on each annotator's characteristics via Collaborative Topic Regression (CTR) and integrates label text to enrich textual representations. During training, the model associates annotators with their label choices given a piece of text; during evaluation, when label information is not available, the model predicts the aggregated label given by the participating annotators by utilising the learnt association. The proposed approach displays an advantage in detecting hate speech, especially in the minority class and in edge cases with annotator disagreement. The improvement in overall performance is largest when the dataset is more label-imbalanced, suggesting its practical value in identifying real-world hate speech, as the volume of hate speech in the wild is extremely small on social media compared with normal (non-hate) speech. Through ablation studies, we show the relative contributions of annotator embeddings and label text to the model performance, and test a range of alternative annotator embeddings and label text combinations.
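
    The point about majority voting can be made concrete with a small sketch. This is not AnnoBERT; the function names and the example labels are hypothetical, and the code only shows what a single aggregated label discards compared with the per-annotator label distribution.

```python
from collections import Counter
from typing import Dict, List


def majority_label(labels: List[str]) -> str:
    """Collapse annotator labels into one label by majority vote.

    Ties are resolved in favour of the label seen first, which is an
    arbitrary choice made for this sketch.
    """
    return Counter(labels).most_common(1)[0][0]


def label_distribution(labels: List[str]) -> Dict[str, float]:
    """Keep the share of annotators behind each label instead of one winner."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Hypothetical annotations for one post from four annotators
annotations = ["hate", "not_hate", "not_hate", "not_hate"]

aggregated = majority_label(annotations)        # the single "ground truth" label
distribution = label_distribution(annotations)  # retains the disagreement
```

    Here the majority vote yields only `not_hate`, while the distribution still records that one annotator in four chose `hate`, which is the kind of subjectivity signal the abstract says is lost.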